ABSTRACT

We address the problem of reconstructing and analyzing 
surveillance videos using compressive sensing. We develop 
a new method that performs video reconstruction by low 
rank and sparse decomposition adaptively. Background sub- 
traction becomes part of the reconstruction. In our method, a 
background model is used in which the background is learned 
adaptively as the compressive measurements are processed. 
The adaptive method has low latency, and is more robust than 
previous methods. We will present experimental results to 
demonstrate the advantages of the proposed method. 

Index Terms — Compressive sensing, low rank and 
sparse decomposition, background subtraction 

1. INTRODUCTION 

In video surveillance, video signals are captured by cameras 
and transmitted to a processing center, where the videos are 
monitored and analyzed. Given the large number of cameras
installed in public places, an enormous amount of data is
generated and must be transmitted over the network, raising a
high risk of network congestion. Therefore, it is highly
desirable to compress the video signals transmitted over the network.

The recently introduced compressive sensing theory
proves that if a signal has a sparse representation in some
basis, then it can be reconstructed from a small set of linear
measurements [1], [2]. The number of measurements can
be much smaller than that required by the Nyquist sampling
rate. Since videos are known to have a sparse representation
in some transform basis (e.g., total variation, wavelet, or
framelet), the compressive sensing theory can be applied
to compress video at the cameras, for example by acquiring
video as compressive measurements, which can then be used
to reconstruct the video [3], [4].

In this paper, we develop a framework for processing
surveillance video using compressive measurements. Our
system is shown in Fig. 1. At the camera, the video captured
by a surveillance camera is either acquired as, or transformed
to, low dimensional measurements by using random projections.
At the processing center, the frames of the video
are reconstructed, and the moving objects are detected at the
same time.

Fig. 1. The framework of the compressive sensing surveillance
system. The video is compressed by using random projections,
and then transmitted to the processing center. The
frames are reconstructed and the moving objects are detected
simultaneously.

Our method is based on three observations: 1) The background
is nearly static over a short period, so the background
images lie in a low dimensional subspace. 2) Natural
images are sparse in a transform domain, such as that of a
tight wavelet frame. 3) Moving objects generally occupy only a
small portion of the field of view of a surveillance camera.
Based on these observations, we use a low rank model for the
background and a sparse model for moving objects. The
reconstruction of background and moving objects is performed
by a low rank and sparse decomposition similar to [5], [6].

In the low rank model of [6], a large number of frames
of video must be used in order to properly reconstruct the 
background because the low rank and sparse decomposition 
computes background frames as a low rank basis of the space 
spanned by the incoming video frames. This results in a long 
latency in the reconstruction. 

In this paper, we introduce an adaptive background model
in which the low rank and sparse decomposition is performed
with a small number of video frames. This significantly
reduces latency. In this adaptive method, the video frames are
reconstructed a few frames at a time. In each reconstruction,
the compressive measurements from a small number of
video frames are used to perform the low rank and sparse
decomposition, which produces a set of background frames. The
background frames are further processed, and the results are
used in the low rank and sparse decomposition for the next
set of frames. Therefore, effectively, a large number of
background frames participate (although not explicitly) in the
computation of the low rank and sparse decomposition
at each reconstruction, since the background frames from
previous reconstructions are used. This makes it possible to
accurately reconstruct background frames even with a small
number of frames processed each time. Because it is adaptive,
the proposed method also follows changes in the background.
Furthermore, the method reduces latency and computational
complexity significantly.

In the remainder of the paper, we first introduce previous
work related to our study. Then we introduce the framework
of our video reconstruction method, followed by the
background model and its adaptation algorithm. The experimental
results are given at the end.

2. RELATED WORK 

Background subtraction. There has been extensive
study of background subtraction from original videos [7].
The earliest background subtraction methods use frame
differencing to detect foreground [8]. Subsequent approaches
aimed to model the variations and uncertainty in background
appearance, such as mixtures of Gaussians [9] and
non-parametric kernel density estimation [10]. Current
state-of-the-art background subtraction methods obtain
satisfactory results for stationary cameras. However, these
methods cannot be applied to compressive measurements.

Sparse reconstruction. Cevher et al. [11] cast
background subtraction as a sparse approximation problem
and solved it by convex optimization. Their method
relies on a background model trained from pure background
frames, which requires prior knowledge of the background.
Jiang et al. [6] developed a low rank and sparse
decomposition based approach to detect moving objects in
a video. Their method solves for all the frames at the same time,
which results in a long latency and a high computational
cost. In contrast, the approach in this paper does not require a
clean background for training, and it reconstructs the background
adaptively, with a small number of video frames processed
at a time. This reduces latency and complexity.

3. LOW RANK AND SPARSE DECOMPOSITION 
3.1. Compressive measurements 

We consider a video consisting of $m$ frames. Each frame has a
total of $n$ pixels. Let $x_j \in \mathbb{R}^{n}$ be a vector formed by
concatenating all pixels in frame $j$. Let
$X = [x_1, \ldots, x_m] \in \mathbb{R}^{n \times m}$
be a matrix containing $m$ columns representing the $m$ frames
of the video. Let $\Phi \in \mathbb{R}^{r \times n}$ be a sensing matrix. The
compressive measurements of $X$ are defined as

$$Y = \Phi \circ X = [\Phi x_1, \ldots, \Phi x_m], \qquad (1)$$

where $Y \in \mathbb{R}^{r \times m}$ is a matrix of measurements, with a much
smaller row dimension than $X$, i.e., $r \ll n$. Each column of
$Y$ contains $r$ measurements of a frame of video. In our work,
$\Phi$ is composed of a set of $r$ randomly permuted rows of a
Walsh-Hadamard matrix.
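
As an illustration of Eq. (1), the measurement step can be sketched in a few lines of NumPy. This is only a minimal sketch under stated assumptions: the Hadamard matrix is built by the Sylvester construction (natural ordering), $n$ is a power of two, and the frame data are random placeholders. The helper names `hadamard` and `sensing_matrix` are hypothetical and are not taken from the paper.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def sensing_matrix(n, r, rng):
    """Take r distinct rows of the normalized Hadamard matrix in random order."""
    rows = rng.permutation(n)[:r]
    return hadamard(n)[rows] / np.sqrt(n)

rng = np.random.default_rng(0)
n, m, r = 64, 5, 16             # pixels per frame, frames, measurements per frame
X = rng.random((n, m))          # placeholder video: one column per frame
Phi = sensing_matrix(n, r, rng)
Y = Phi @ X                     # Eq. (1): column j of Y is Phi @ x_j
```

For realistic frame sizes a dense $\Phi$ would not be stored; a fast Walsh-Hadamard transform followed by row selection would typically be used instead.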



3.2. Reconstruction 

Given the measurements $Y$, we want to reconstruct the original
video $X$. $X$ can be decomposed into a background matrix
$X_1$ and a foreground matrix $X_2$:

$$X = X_1 + X_2. \qquad (2)$$

Here, $X_1$ is a matrix each column of which is formed from
the pixels of a background frame of the video. Similarly, $X_2$
is a matrix each column of which is formed from the pixels of
a foreground frame of the video. Thus the objective is to solve for
$X_1$ and $X_2$ satisfying Eqs. (1) and (2). This is
an ill-posed problem with an infinite number of solutions.
Therefore, we need some prior knowledge to find a proper
solution.
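
The ill-posedness can be checked numerically: with $r < n$ the sensing matrix has a nontrivial null space, so two different videos can produce identical measurements. The sketch below assumes a Gaussian stand-in for the sensing matrix and random placeholder frames; it is not the sensing matrix used in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 32, 4, 8
Phi = rng.standard_normal((r, n))     # stand-in sensing matrix (assumption)
X = rng.random((n, m))                # placeholder video

# With full row rank, rows r..n-1 of Vt span the null space of Phi.
_, _, Vt = np.linalg.svd(Phi)
null_basis = Vt[r:].T                 # shape (n, n - r)

# Any combination of null-space vectors is invisible in the measurements.
N = null_basis @ rng.standard_normal((n - r, m))
X_alt = X + N
same_measurements = np.allclose(Phi @ X, Phi @ X_alt)
```

Here `X` and `X_alt` differ, yet both satisfy Eq. (1) for the same measurements, which is why the priors that follow are needed.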

Low rank background. We assume the background images
change relatively little over a short period, so the
background matrix $X_1$ should have a low rank [5]. We use
the nuclear norm to measure the rank of this matrix, which is
defined as the sum of the singular values $\sigma_i$:

$$\|X_1\|_* = \mathrm{trace}\left(\sqrt{X_1 X_1^T}\right) = \sum_i \sigma_i. \qquad (3)$$
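
The identity in Eq. (3) and the low rank assumption can be checked on synthetic data. The sketch below assumes a nearly static background, modeled as one frame repeated $m$ times with a small perturbation; all sizes and the noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 6
b = rng.random((n, 1))
# Nearly static background: the same frame repeated, plus a small perturbation.
X1 = b @ np.ones((1, m)) + 1e-3 * rng.standard_normal((n, m))

sigma = np.linalg.svd(X1, compute_uv=False)   # singular values, descending
nuc_from_svd = sigma.sum()

# trace(sqrt(.)): X1.T @ X1 shares its nonzero eigenvalues with X1 @ X1.T
# and is much smaller, so it is used here for the eigen-decomposition.
eig = np.linalg.eigvalsh(X1.T @ X1)
nuc_from_trace = np.sqrt(np.clip(eig, 0.0, None)).sum()
```

The first singular value dominates all others, reflecting that the synthetic background is close to rank one.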

Sparsity in transformed domain. Previous work shows
that natural images can be sparsely represented in a transformed
space. We assume each background frame is sparse
under a transform $W_1$, and each foreground frame is sparse under
a transform $W_2$ [6]. We use the $\ell_1$-norm to measure
the sparsity of the transformed background and foreground,
$\|W_1 \circ X_1\|_1$ and $\|W_2 \circ X_2\|_1$, where the $\ell_1$-norm is defined as

$$\|X\|_1 = \sum_i \sum_j |x_{ij}|, \quad X = [x_{ij}]. \qquad (4)$$

Sparse foreground. We also assume the foreground
occupies only a small portion of a frame, and therefore we can also
use the $\ell_1$-norm as defined in Eq. (4) to measure its sparsity:
$\|X_2\|_1$.

Given these prior assumptions, $X_1$ and $X_2$ can be reconstructed
by solving the following optimization problem:

$$(X_1, X_2) = \arg\min_{X_1, X_2} \; \mu_1 \|X_1\|_* + \mu_2 \|W_1 \circ X_1\|_1 + \mu_3 \|W_2 \circ X_2\|_1 + \mu_4 \|X_2\|_1 \qquad (5)$$

such that $Y = \Phi \circ (X_1 + X_2)$.

Here, $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ are nonnegative weights, and $W_1$ and
$W_2$ are sparsifying operators. In our system, we set $W_1 =
W_2 = W$ as the framelet transform [12], [13], [6].
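
To make the terms of Eq. (5) concrete, the cost can be evaluated for a candidate pair $(X_1, X_2)$ as sketched below. The framelet transform is not implemented here; `difference_transform` (first differences along each column) is a hypothetical stand-in chosen only so that the example runs, and the weights are arbitrary.

```python
import numpy as np

def l1_norm(A):
    """Entrywise l1-norm of Eq. (4)."""
    return np.abs(A).sum()

def nuclear_norm(A):
    """Sum of singular values, Eq. (3)."""
    return np.linalg.svd(A, compute_uv=False).sum()

def difference_transform(X):
    """Stand-in sparsifying operator (assumption): first differences per column."""
    return np.diff(X, axis=0)

def objective(X1, X2, mu, W):
    """Cost of Eq. (5); W is a callable applied to the whole frame matrix."""
    mu1, mu2, mu3, mu4 = mu
    return (mu1 * nuclear_norm(X1) + mu2 * l1_norm(W(X1))
            + mu3 * l1_norm(W(X2)) + mu4 * l1_norm(X2))

X1 = np.ones((4, 3))                  # constant background, rank one
X2 = np.zeros((4, 3))
X2[1, 0] = 2.0                        # a single foreground pixel
cost = objective(X1, X2, mu=(1.0, 1.0, 1.0, 1.0), W=difference_transform)
```

The function evaluates the cost only; it does not enforce the constraint $Y = \Phi \circ (X_1 + X_2)$, which must be handled by the solver.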

Eq. (5) is a convex problem, so standard convex optimization
algorithms such as the interior point method [14] can be
applied to find a solution. However, these standard methods
are computationally expensive. Instead, as shown in [15],
singular value thresholding is more efficient for low rank
decomposition. We apply the Augmented Lagrangian Alternating
Direction (ALAD) algorithm introduced in Jiang et al. [6].
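
For reference, the singular value thresholding operator mentioned above can be sketched as follows. This is the standard textbook form of the operator (shrink every singular value by a threshold and discard those that fall below zero), not code from [6] or [15]; the test matrix and threshold are arbitrary.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: shrink each singular value of A by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt        # same as U @ diag(s_shrunk) @ Vt

rng = np.random.default_rng(3)
u, v = rng.random((30, 1)), rng.random((1, 10))
A = u @ v + 0.01 * rng.standard_normal((30, 10))   # rank-one part plus small noise
B = svt(A, tau=0.5)
```

With this threshold the small singular values caused by the noise are removed, so `B` keeps only the dominant rank-one component.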



4. ADAPTIVE RECONSTRUCTION 

For the optimization problem described in [6], a large number
of frames (e.g., $m > 100$) is needed to find a proper solution,
which leads to a high latency in the reconstruction. In addition,
the computational complexity of singular value thresholding
grows polynomially with $m$, which makes the algorithm
computationally expensive as $m$ becomes large.

To reconstruct the background and foreground by solving
Eq. (5), a large number of frames (i.e., the number of columns
of $X_1$) is required. This is because the solution to Eq. (5)
captures the low rank basis of the space spanned by $X_1$. If the
number of frames is small, a moving object may not change
significantly and would thus be captured as part of the background.
Only when a large number of frames is used does the solution to Eq. (5)
reconstruct the background as expected. This is the reason that
a large number of frames must be used in [6].

In this section, we introduce an adaptive method to reduce
both latency and complexity. In order to reduce latency,
we want to process a small number of frames each
time. However, in order to improve the accuracy of the reconstructed
background, we still need a large number of columns to be
present in the calculation of the nuclear norm $\|\cdot\|_*$. For this
purpose, we augment $X_1$ by the previously calculated background
frames. In other words, we replace $\|X_1\|_*$ in Eq. (5)
by $\|[M_b, X_1]\|_*$, where $M_b$ is a matrix which is a model of
previously calculated background frames; see Eqs. (8)
and (9) below.

The key idea of the paper is that $M_b$, a representation of
previously calculated background frames, is low dimensional
and is computed adaptively as more frames are processed.
$M_b$ may initially be an inaccurate approximation of the background
frames, but as the adaptation proceeds, $M_b$ becomes
a progressively better representation of the background frames.
Furthermore, as the background changes, $M_b$ changes
accordingly. Therefore, this method not only
reduces latency and complexity, but also allows the reconstructed
background frames to adapt quickly to changes
in the background of the video.

4.1. Augmented low rank decomposition 

We assume that a set of $k$ background frames, $b_j$, has already
been computed in processing the previous frames. We put them in
a background matrix defined as

$$X_b = [b_1, \ldots, b_k] \in \mathbb{R}^{n \times k}.$$

The augmented background matrix $\tilde{X}_1$ is formed by combining
the previously computed background matrix $X_b$ with
the to-be-computed background $X_1$ of $m$ new frames:

$$\tilde{X}_1 = [X_b, X_1].$$

The use of the augmented matrix makes it possible to reconstruct
$X_1$, $X_2$ even if $X_1$ has a very small number of columns.
We now require $\tilde{X}_1$, instead of $X_1$, to have a small rank.
Therefore, the problem to solve is the same as Eq. (5) but with
$\|X_1\|_*$ replaced by $\|\tilde{X}_1\|_*$. By using $\tilde{X}_1$, there is no need for
$X_1$ to have a large number of columns.

4.2. Low dimensional background model 

The computational cost of minimizing the nuclear norm of the
augmented matrix $\tilde{X}_1 = [X_b, X_1]$ grows polynomially with $k + m$,
and therefore grows quickly as frames are continuously
processed. We thus need to find a lower
dimensional background model $M_b \in \mathbb{R}^{n \times p}$ from the computed
background frames $X_b$, giving a new augmented matrix
$[M_b, X_1] \in \mathbb{R}^{n \times (p+m)}$, where $p \ll k$. We need to find $M_b$
such that the nuclear norm of $[M_b, X_1]$ approximates the
nuclear norm of $\tilde{X}_1$, which leads to the following optimization
problem:

$$\arg\min_{M_b} \; \Big| \, \|\tilde{X}_1\|_* - \|[M_b, X_1]\|_* \, \Big|. \qquad (6)$$



We perform the SVD of the background matrix
$X_b$, and form $M_b$ as

$$X_b = U D V^T, \qquad (7)$$
$$M_b = U_p D_p. \qquad (8)$$

In Eqs. (7) and (8), $D$ is a diagonal matrix containing the singular
values of $X_b$, and $U$, $V$ are orthogonal matrices. $D_p$ is a
diagonal matrix formed by the $p$ largest singular values, and $U_p$
consists of the first $p$ columns of $U$.
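
The construction in Eqs. (7) and (8) can be sketched with a truncated SVD. The example below assumes synthetic background frames of exact rank $p$, in which case the nuclear norm of the reduced matrix $[M_b, X_1]$ matches that of the full augmented matrix $[X_b, X_1]$ up to rounding. The function name `background_model` and all sizes are hypothetical.

```python
import numpy as np

def background_model(Xb, p):
    """Eqs. (7)-(8): p leading left singular vectors scaled by their singular values."""
    U, s, _ = np.linalg.svd(Xb, full_matrices=False)
    return U[:, :p] * s[:p]           # equals U_p @ diag(s_p), shape (n, p)

def nuclear_norm(A):
    return np.linalg.svd(A, compute_uv=False).sum()

rng = np.random.default_rng(4)
n, k, m, p = 60, 40, 4, 3
Xb = rng.random((n, p)) @ rng.random((p, k))   # k past background frames, rank p
X1 = rng.random((n, m))                        # stand-in for m new background frames
Mb = background_model(Xb, p)

full = nuclear_norm(np.hstack([Xb, X1]))       # k + m columns
reduced = nuclear_norm(np.hstack([Mb, X1]))    # only p + m columns
```

When $X_b$ is only approximately of rank $p$, the two values agree only approximately, which is the situation Eq. (6) is meant to address.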

Now, replacing $\|X_1\|_*$ by $\|[M_b, X_1]\|_*$ in Eq. (5), we
have the low latency reconstruction given as:

$$(X_1, X_2) = \arg\min_{X_1, X_2} \; \mu_1 \|[M_b, X_1]\|_* + \mu_2 \|W_1 \circ X_1\|_1 + \mu_3 \|W_2 \circ X_2\|_1 + \mu_4 \|X_2\|_1, \qquad (9)$$

such that $Y = \Phi \circ (X_1 + X_2)$.



4.3. Optimization 

We now use the Augmented Lagrangian Alternating Direction
(ALAD) algorithm to solve the problem in Eq. (9). The main
difficulty is that the nuclear norm term involves an augmented
matrix having both known columns and unknown columns.
However, this can be handled by replacing the augmented
matrix with a new variable. In addition, we introduce splitting
variables to make the objective function separable. We
perform variable substitution as below:

$$Z_1 = [M_b, X_1], \quad Z_2 = W_1 \circ X_1, \quad Z_3 = W_2 \circ X_2. \qquad (10)$$

The ALAD optimization is shown in Alg. 1. More details
about the optimization framework can be found in [6].
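
As a rough guide to what the splitting in Eq. (10) buys, the updates of the splitting variables and multipliers in a generic alternating direction scheme have closed forms. The block below is a sketch under assumed notation (multipliers $\Lambda_i$, a single penalty parameter $\beta > 0$, and the sign convention stated in the comments); it is not necessarily the exact form of the updates used in [6].

```latex
% Assumed notation: A_1 = [M_b, X_1], A_2 = W_1 \circ X_1, A_3 = W_2 \circ X_2,
% with augmented Lagrangian terms
%   \langle \Lambda_i, Z_i - A_i \rangle + \tfrac{\beta}{2} \| Z_i - A_i \|_F^2 .
\begin{align*}
Z_1 &\leftarrow \mathrm{SVT}_{\mu_1/\beta}\!\left(A_1 - \Lambda_1/\beta\right), \\
Z_i &\leftarrow \mathrm{soft}_{\mu_i/\beta}\!\left(A_i - \Lambda_i/\beta\right), \qquad i = 2, 3, \\
\Lambda_i &\leftarrow \Lambda_i + \beta\,(Z_i - A_i), \qquad i = 1, 2, 3,
\end{align*}
% where, for A = U D V^T,
%   \mathrm{SVT}_\tau(A) = U \max(D - \tau, 0) V^T   (applied to the singular values),
%   \mathrm{soft}_\tau(a) = \mathrm{sign}(a) \max(|a| - \tau, 0)   (applied entrywise).
```

Under these assumptions the nuclear norm term reduces to one singular value thresholding of a matrix with only $p + m$ columns per iteration, and the $\ell_1$ terms on $Z_2$ and $Z_3$ reduce to entrywise shrinkage.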

Algorithm 1 Reconstructing $X_1$ and $X_2$ using ALAD.

Initialize $\Lambda_i$
repeat
    Update $X_1$, $X_2$, while fixing $Z_i$ and $\Lambda_i$,
    Update $Z_i$, while fixing $X_1$, $X_2$ and $\Lambda_i$,
    Update $\Lambda_i$, while fixing $X_1$, $X_2$ and $Z_i$,
until convergence

Fig. 2. Results of video reconstruction and background subtraction. Left: original frames; Middle: background and foreground
reconstructed using the method of this paper; Right: foreground masks generated from the original video with GMM.

4.4. Updating the background model

With the previously computed $M_b$, Eq. (9) can be used to
compute the current background frames $X_1$ by Alg. 1. The
question is then how to update $M_b$ with the current $X_1$ to obtain
a new background model $M_b^{(new)}$, so that Eq. (9) can be solved
to reconstruct the next set of frames. We use an approach
to update $M_b$ similar to the incremental SVD [16]. Given
the SVD $X_b \approx U_p D_p V_p^T$, the decomposition
of the augmented matrix with the current background frames $X_1$
can be used to update $M_b$ as follows:



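$$\begin{aligned}
[U^{(new)}, D^{(new)}] &= \mathrm{svd}([w_b X_b, \; w_a X_1]) \\
&\approx \mathrm{svd}([w_b U_p D_p V_p^T, \; w_a X_1]) \\
&= \mathrm{svd}([w_b U_p D_p, \; w_a X_1]) \\
&= \mathrm{svd}([w_b M_b, \; w_a X_1]), \\
M_b^{(new)} &= U_p^{(new)} D_p^{(new)}. \qquad (11)
\end{aligned}$$

In Eq. (11), $\mathrm{svd}(\cdot)$ is understood as returning the left singular
vectors and the singular values, which are unchanged when $V_p^T$ is
dropped because $V_p$ has orthonormal columns. $D_p^{(new)}$ is a
diagonal matrix formed by the $p$ largest
singular values, and $U_p^{(new)}$ consists of the first $p$ columns
of $U^{(new)}$, similar to those in Eq. (8). $w_a$ and $w_b$ are weights
controlling the updating rate.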

It is important to point out that in the update (11), the large
matrix $V$ in the SVD never needs to be computed, which represents
a significant reduction in complexity.
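
The update can be sketched as follows, assuming synthetic data and illustrative weights ($w_b = 0.9$, $w_a = 1$ are arbitrary values, not those used in the experiments). The sketch also checks numerically that dropping the right factor $V_p^T$ leaves the singular values of the augmented matrix unchanged, which is why only an $n \times (p + m)$ matrix has to be decomposed. The function name `update_background_model` is hypothetical.

```python
import numpy as np

def update_background_model(Mb, X1, p, w_b=1.0, w_a=1.0):
    """Eq. (11): refresh the n x p model from the old model and new background frames."""
    A = np.hstack([w_b * Mb, w_a * X1])               # n x (p + m), small
    U, s, _ = np.linalg.svd(A, full_matrices=False)   # small right factor is discarded
    return U[:, :p] * s[:p]                           # U_p^(new) D_p^(new)

rng = np.random.default_rng(5)
n, k, m, p = 60, 40, 4, 3
Xb = rng.random((n, p)) @ rng.random((p, k))          # past background, rank p
U, s, Vt = np.linalg.svd(Xb, full_matrices=False)
Mb = U[:, :p] * s[:p]                                 # Eq. (8)
X1 = rng.random((n, m))                               # stand-in for new background frames
Mb_new = update_background_model(Mb, X1, p, w_b=0.9, w_a=1.0)

# Singular values with and without the k-column right factor V_p^T.
s_large = np.linalg.svd(np.hstack([0.9 * (Mb @ Vt[:p]), X1]), compute_uv=False)
s_small = np.linalg.svd(np.hstack([0.9 * Mb, X1]), compute_uv=False)
```

Choosing $w_b < w_a$ gives older background frames less influence, so the model follows background changes faster; the best setting depends on the scene.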

5. EXPERIMENTS 

We perform experiments on three video clips from the PETS2001
database. The results are shown in Fig. 2. The first column
shows the original frames. The second and third columns
show the backgrounds and foregrounds reconstructed by the
method of this paper. We use measurements amounting to 5% of the
original data for the first two examples, and 10% for the last example.
Median filters are used to post-process the results of our
method to reduce noise. The last column shows the
foregrounds generated by applying the Gaussian mixture model
(GMM) [9].

Fig. 2 demonstrates that the results of our method are
comparable to those of GMM, although our method uses only
5%-10% of the original data, while GMM uses 100%.
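
The median filtering step can be illustrated on a toy foreground frame. The sketch below is an assumption about how such post-processing might look: the threshold, the 3 x 3 window, and the edge handling are arbitrary choices, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter3(img):
    """3 x 3 median filter with edge replication."""
    padded = np.pad(img, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))    # shape (H, W, 3, 3)
    return np.median(windows, axis=(2, 3))

def foreground_mask(frame, threshold):
    """Threshold the foreground magnitude, then remove isolated pixels."""
    mask = (np.abs(frame) > threshold).astype(float)
    return median_filter3(mask) > 0.5

frame = np.zeros((8, 8))
frame[2:5, 2:5] = 1.0      # a small moving object
frame[6, 6] = 1.0          # an isolated noisy pixel
mask = foreground_mask(frame, threshold=0.5)
```

The isolated pixel is removed while the interior of the object survives; the object corners are also eroded, which is the usual trade-off of a median filter on small regions.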

6. CONCLUSION 

In this paper, we address the problem of reconstructing and 
analyzing surveillance videos from compressive measure- 
ments. We propose a method that simultaneously performs 
reconstruction and background subtraction with low latency. 
Our method is built on a background model, which is continuously
updated as new frames are reconstructed. The
experiments demonstrate the effectiveness and efficiency of
the proposed method.